Abstract

The idea of doing a curvilinear search along the Levenberg-Marquardt
path $s(\mu) = -(H + \mu I)^{-1} g$ has always been appealing, but the cost
of solving a linear system for each trial value of the parameter $\mu$ has
discouraged its implementation. In this paper, an algorithm for searching
along a path which includes $s(\mu)$ is studied. The algorithm uses
a special inexpensive $Q T_c Q^T$ to $Q T_+ Q^T$ Hessian update which
trivializes the linear algebra required to compute $s(\mu)$. This update is
based on earlier work of Dennis-Marwil and Martinez on least-change secant
updates of matrix factors. The new algorithm is shown to be locally
q-superlinearly convergent to stationary points, and to be globally
q-superlinearly convergent for quasi-convex functions. Computational
tests are given that show the new algorithm to be robust and efficient.
1 Introduction


In this paper, we consider iterative methods for solving the smooth
unconstrained minimization problem:

$$\min_x f(x); \quad f : \Omega \subset \mathbb{R}^n \to \mathbb{R}; \quad f \in C^1(\Omega),$$

for $\Omega$ open in $\mathbb{R}^n$. We denote $g(x) = \nabla f(x)$ for all
$x \in \Omega$. We will use the $\ell_2$ norm whenever another norm is not
indicated.

Our methods are based on the common notion of choosing a trial step from
the current iterate $x_c$ to the next iterate based on a local quadratic model
of $f(x_c + s) - f(x_c)$ of the form:

$$q_c(s) = g_c^T s + \tfrac{1}{2} s^T H_c s, \quad \text{where } g_c = \nabla f(x_c) \text{ and } H_c = H_c^T. \qquad (1.1)$$

Our methods belong to a class often called curvilinear search methods, and
the curvilinear path we search along is the same one in $\mathbb{R}^n$ from
which the trust-region method based on the same model would choose its step.
The major difference from trust-region methods is that, even if we eventually
choose the same trial step, we do our search based on the 'Levenberg-Marquardt'
parameter rather than on the length of the step. Methods based on other
curvilinear paths have been published, but since none are in general use, we
omit any comparative discussion. Most relevant is that Schramm and Zowe
[11] in their B-T algorithm for nonsmooth optimization search the analogous
curve.

The key to the practicality of the particular method we test is that we
build the local model (1.1) in a form that trivializes the linear algebra needed
to compute any trial step along the search path. For example, standard
approaches would require a Cholesky factorization at each trial step, but we
need only solve a tridiagonal system and do two matrix-vector products.
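
To make the cost claim concrete, here is a minimal Python sketch (not the authors' implementation) of one trial step $s(\mu) = -Q (T + \mu I)^{-1} Q^T g$ when the model Hessian is held as $Q T Q^T$. The function names `solve_tridiagonal`, `matvec`, `transpose` and `lm_step`, and the list-based storage, are assumptions made for illustration. The tridiagonal solve omits pivoting, which is only safe when $T + \mu I$ is positive definite.

```python
def solve_tridiagonal(diag, off, rhs):
    """Solve a symmetric tridiagonal system by the Thomas algorithm.

    diag: n main-diagonal entries; off: n-1 off-diagonal entries.
    No pivoting, so this assumes the matrix is positive definite.
    """
    n = len(diag)
    c = [0.0] * n  # scaled super-diagonal
    d = [0.0] * n  # scaled right-hand side
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def lm_step(Q, diag, off, g, mu):
    """Trial step s(mu) = -Q (T + mu I)^{-1} Q^T g (hypothetical helper)."""
    qtg = matvec(transpose(Q), g)                       # first matrix-vector product
    z = solve_tridiagonal([t + mu for t in diag], off, qtg)
    return [-v for v in matvec(Q, z)]                   # second matrix-vector product
```

Changing $\mu$ only changes the diagonal passed to the tridiagonal solve, so each additional trial value costs $O(n)$ for the solve plus the two products with $Q$.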

This paper is organized as follows: Section 2 contains a global convergence
analysis in which we assume that the sequence of model Hessians is bounded,
but we do not specify how the Hessians are to be chosen. We define the set
from which a trial step must be chosen that satisfies an Armijo criterion. We
show that there are steps in the set that satisfy the sufficient decrease
criterion, but we do not specify how the step is to be found.

In Section 3, we assume that $\nabla^2 f$ is Lipschitz continuous on $\Omega$,
and we present a new least-change secant method for defining $H_+$ from $H_c$
and apply the results of Section 2 to the resulting algorithm. This method is
in the spirit of [2], [7], [5] in that there is never any need to form $H_+$.
Instead, $H_c$ is held in the form $Q_c T_c Q_c^T$, $Q_c$ orthogonal, $T_c$
tridiagonal, and $H_+ = Q_c T_+ Q_c^T$ is defined by doing a sparse symmetric
secant update of $T_c$ to get $T_+$.
In Section 4, we validate the new update by giving a local convergence
analysis of the corresponding full step quasi-Newton method to stationary
points of $f$. In Section 5, we add a convexity assumption on $f$ and prove
that the particular method from Section 3 that always tries the Newton step
first when $H_c$ is positive definite is globally q-superlinearly convergent.
This order of convergence result is no better than we could prove if we did
not do the updates, but the updates cost a low multiple of $n$, and they are
certainly worthwhile computationally, as is shown in Section 7.3. Section 6
discusses an implementation and Section 7 gives some numerical results for a
particular method from Section 3.


2 The General Algorithm: Global Convergence

In  this  section  we  state  a  general  algorithm  of  the  type  studied  here.  We  make 
the  algorithm  only  as  specific  as  necessary  to  prove  a  global  convergence  result. 

Given $x \in \Omega$, $H$ a symmetric $n \times n$ matrix,
$\lambda_1 = \lambda_1(H)$ the smallest eigenvalue of $H$, and $V_1$ the
corresponding eigenspace, we define a curve parameterized by $\mu$:

$$\Gamma_1(x,H) = \{x - (H + \mu I)^{-1} g(x) : \mu \ge 0,\ \mu > -\lambda_1\}.$$

If $g(x) \notin V_1^{\perp}$, or if $\lambda_1 > 0$, we define
$\Gamma(x,H) = \Gamma_1(x,H)$. Otherwise, we choose $v \in V_1$, $v \ne 0$,
and we define a curve parameterized by $\mu$:

$$\Gamma(x,H) = \Gamma_1(x,H) \cup \Gamma_2(x,H),$$

where

$$\Gamma_2(x,H) = \{x - (H - \lambda_1 I)^{+} g(x) + \mu v : \mu \in \mathbb{R}\}.$$

The following lemma, which follows from Gay [4] and More-Sorensen [8], gives a
geometrical meaning to $\Gamma(x,H)$. It shows that if $\lambda_1 \le 0$ and
if $g(x) \in V_1^{\perp}$, then any $v \in V_1$ gives the same result for the
quadratic. In our implementation, we always choose trial steps that stand in
the same relation to the current iterate that $z$ has to $x$ in the hypotheses
of the lemma. However, we have no need to be so specific in order to prove
global convergence in the next section.

Lemma 2.1 Let $x \in \Omega$, $z \in \Gamma(x,H)$. Then $z$ is a minimizer of

$$q(w) = \tfrac{1}{2}(w - x)^T H (w - x) + g(x)^T (w - x) \quad \text{subject to } \|w - x\| \le \|z - x\|,$$

and the direction from $x$ to $z$ is a descent direction for $q$. Furthermore,
assume $z \in \Gamma_1(x,H)$, then $z$ is the unique minimizer. If
$0 < \delta < \|z - x\|$, then there is a unique $w \in \Gamma(x,H)$ such that
$\|w - x\| = \delta$. Also, $w \in \Gamma_1(x,H)$.

Proof: This is just a slight restatement of a standard result of Gay [4] and
Sorensen. For example, see Lemma 2.3 of More and Sorensen [8]. □

The following algorithm describes the way of obtaining a new approximation
$x_+$ to the minimizer of $f$, starting from a current approximation
$x_c \in \Omega$ such that $g_c \ne 0$ and using a current Hessian
approximation $H_c$. A large positive number $\Delta$ is used to bound the
steplength, and $\Delta_c$ and $\bar\Delta_c$ are constants needed in the
convergence proof. The algorithm parameters $\alpha \in (0, \tfrac{1}{2})$,
$\beta \in (0,1)$ are used to guarantee sufficient decrease. We use
$\alpha = 10^{-4}$ and $\beta$ = macheps.

Algorithm 2.1


Given $H_c$, $x_c$;

If $\lambda_1(H_c) \le 0$ Then $\bar\Delta_c = \Delta_c = \Delta$;

Else $s^N = -H_c^{-1} g(x_c)$; $\bar\Delta_c = \Delta_c = \min\{\Delta, \|s^N\|\}$;

Set $x = x_c$;

While ($x = x_c$ or $f(x) > f(x_c) + \alpha g(x_c)^T (x - x_c)$) DO

    Choose $x \in \Gamma(x_c, H_c)$ such that $\beta^2 \Delta_c \le \|x - x_c\| \le \Delta_c$;
    $\Delta_c = \Delta_c / 2$;

ENDDO;

Set $x_+ = x$;
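
The loop of Algorithm 2.1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the method tested in the paper: the model Hessian is taken to be diagonal, `H = diag(d)`, so that $s(\mu)$ has a closed form; the hard case ($g \perp V_1$ with $\lambda_1 \le 0$) is excluded; and the "Choose" step is realized by one hypothetical strategy, `choose_mu`, which brackets and bisects on $\mu$ until $\|s(\mu)\|$ lies in $[\max(\beta^2, 0.5)\,\Delta_c,\ \Delta_c]$. All function names are invented for this sketch.

```python
import math


def step(d, g, mu):
    """s(mu) = -(H + mu I)^{-1} g for the diagonal model Hessian H = diag(d)."""
    return [-gi / (di + mu) for di, gi in zip(d, g)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def choose_mu(d, g, delta, lower):
    """One possible 'Choose': find mu with lower <= ||s(mu)|| <= delta."""
    mu_min = max(0.0, -min(d))
    if min(d) > 0.0 and norm(step(d, g, 0.0)) <= delta:
        return 0.0  # the Newton step already fits
    lo, hi = mu_min, mu_min + 1.0
    while norm(step(d, g, hi)) > delta:      # grow mu until the step fits
        lo, hi = hi, mu_min + 2.0 * (hi - mu_min)
    for _ in range(200):                     # bisect; ||s(mu)|| decreases in mu
        if norm(step(d, g, hi)) >= lower:
            break
        mid = 0.5 * (lo + hi)
        if norm(step(d, g, mid)) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def algorithm_2_1(f, x_c, g_c, d, Delta=1.0e3, alpha=1.0e-4, beta=2.2e-16):
    """Return x_+ satisfying the sufficient decrease test of Algorithm 2.1."""
    if min(d) <= 0.0:
        delta = Delta
    else:
        delta = min(Delta, norm(step(d, g_c, 0.0)))
    f_c = f(x_c)
    while True:
        lower = max(beta ** 2, 0.5) * delta
        s = step(d, g_c, choose_mu(d, g_c, delta, lower))
        x = [xi + si for xi, si in zip(x_c, s)]
        delta *= 0.5                         # halve the bound after each trial
        if f(x) <= f_c + alpha * sum(gi * si for gi, si in zip(g_c, s)):
            return x
```

When `d` is positive and the Newton step is shorter than `Delta`, the first trial is the Newton step itself; otherwise each pass moves toward the steepest descent direction by increasing $\mu$.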


Remark. 

Obviously,  the  efficiency  of  Algorithm  2.1  depends  on  the  way  x  is  selected. 
“Choose”  is  a  very  ambiguous  word  that  we  use  deliberately  to  show  that  many 
strategies  are  possible. 

Let us now prove that, given $x_c$, $H_c$, with $g_c = g(x_c) \ne 0$,
Algorithm 2.1 is always able to finish by finding a point $x$ which satisfies
the sufficient decrease condition

$$f(x) \le f_c + \alpha g_c^T (x - x_c). \qquad (2.1)$$

Theorem 2.2 After a finite number of DO loop executions, Algorithm 2.1
obtains a point $x = x_+$ that satisfies (2.1).

Proof:
We only need to prove that, if $\|x - x_c\|$ is small enough and
$x \in \Gamma(x_c, H_c)$, then (2.1) is satisfied. Using Lemma 2.1, it is easy
to see that

$$\lim_{\substack{x \to x_c \\ x \in \Gamma(x_c,H_c)}} \frac{x - x_c}{\|x - x_c\|} = \lim_{\substack{x \to x_c \\ x \in \Gamma_1(x_c,H_c)}} \frac{x - x_c}{\|x - x_c\|} = -\frac{g(x_c)}{\|g(x_c)\|}, \qquad (2.2)$$

since if $\|x - x_c\|$ is small enough, then $x \in \Gamma_1(x_c,H_c)$.
Therefore, using (2.2) and the Mean Value Theorem, we have

$$\frac{f(x) - f(x_c)}{\|x - x_c\|} = \frac{g(x_c + \xi(x - x_c))^T (x - x_c)}{\|x - x_c\|} \quad \text{with } \xi \in (0,1).$$

Hence,

$$\lim_{\substack{x \to x_c \\ x \in \Gamma(x_c,H_c)}} \frac{f(x) - f(x_c)}{\|x - x_c\|} = g(x_c)^T \lim_{\substack{x \to x_c \\ x \in \Gamma_1(x_c,H_c)}} \frac{x - x_c}{\|x - x_c\|} = -\|g(x_c)\| < -\alpha \|g(x_c)\| \le \alpha \frac{g(x_c)^T (x - x_c)}{\|x - x_c\|}$$

for any $x \ne x_c$, and the required result follows from this inequality. □

We now give a result that we need to prove global convergence of Algorithm
2.1.


Lemma 2.3 Assume that $\|H_k\| \le B$ for $k = 0,1,2,\ldots$ and
$\lim_{k\to\infty} x_k = x_*$ with $g(x_*) \ne 0$. Let $\{\bar x_k\}$ be any
sequence such that $\bar x_k \in \Gamma(x_k,H_k)$,
$\lim_{k\to\infty} \|\bar x_k - x_k\| = 0$. Then there exists a subsequence
$\{\bar x_{k_j}\}$ of $\{\bar x_k\}$ such that, for this subsequence,

$$\lim_{j\to\infty} \frac{\bar x_{k_j} - x_{k_j}}{\|\bar x_{k_j} - x_{k_j}\|} = -\frac{g(x_*)}{\|g(x_*)\|}.$$


Proof:

Let $\{H_k\}_{k \in K_1}$ be a convergent subsequence of $\{H_k\}$. Then for
some $H$,

$$\lim_{k \in K_1} H_k = H, \quad \|H\| \le B.$$

For $k \in K_1$ let us write

$$H_k = Q_k D_k Q_k^T, \qquad (2.3)$$

where $D_k = \mathrm{diag}(\lambda_1(H_k), \ldots, \lambda_n(H_k))$,
$\lambda_1(H_k) \le \cdots \le \lambda_n(H_k)$. By the continuity property of
eigenvalues (see Wilkinson [12], pg. 63, or Ostrowski [9], pg. 225), we have:

$$\lim_{k \in K_1} \lambda_i(H_k) = \lambda_i(H), \quad i = 1, \ldots, n$$

where $\lambda_i(H)$, $i = 1, \ldots, n$ are the eigenvalues of $H$ in
increasing order. Now, the matrices $\{Q_k\}_{k \in K_1}$ are contained in a
compact set of $\mathbb{R}^{n \times n}$. Therefore, there exists a convergent
subsequence $\{Q_k\}_{k \in K_2}$, $K_2 \subset K_1$, such that

$$\lim_{k \in K_2} Q_k = Q,$$

and $Q$ is an orthogonal $n \times n$ matrix. Hence, taking limits in (2.3)
for $k \in K_2$, we have:

$$H = Q D Q^T,$$

where $D = \mathrm{diag}(\lambda_1(H), \ldots, \lambda_n(H))$,
$Q = (v_1, \ldots, v_n)$. Now, $g(x_*) \ne 0$, so there exists
$m \in \{1, \ldots, n\}$ such that

$$g(x_*)^T v_m \ne 0. \qquad (2.4)$$


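Therefore, there exist $\bar\mu > -\lambda_1$ and a constant $\gamma > 0$ such
that

$$\frac{|g(x_*)^T v_m|}{\lambda_m + \bar\mu} \ge \gamma. \qquad (2.5)$$

Hence, taking limits for $k \in K_2$, we have, for large enough $k \in K_2$,
with $v_m^k$ the $m$-th column of $Q_k$,

$$\frac{|g(x_k)^T v_m^k|}{\lambda_m(H_k) + \bar\mu} \ge \frac{\gamma}{2}. \qquad (2.6)$$

But,

$$\|-(H_k + \bar\mu I)^{-1} g(x_k)\| \ge \frac{|g(x_k)^T v_m^k|}{\lambda_m(H_k) + \bar\mu}.$$

Therefore, for large enough $k \in K_2$, by Lemma 2.1 and (2.6) there exists
$z_k \in \Gamma_1(x_k,H_k)$ such that

$$\|z_k - x_k\| = \frac{\gamma}{2}. \qquad (2.7)$$

Hence, since $\lim_{k\to\infty} \|\bar x_k - x_k\| = 0$, Lemma 2.1 and (2.7)
imply that $\bar x_k \in \Gamma_1(x_k,H_k)$ for large enough $k \in K_2$ (say,
$k \in K_3$).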


For $k \in K_3$, let $\mu_k$ denote the parameter of $\bar x_k$ on
$\Gamma_1(x_k,H_k)$. We now want to prove that
$\lim_{k \in K_3} \mu_k = \infty$. We proceed by contradiction. Assume that
$\mu_k \le \mu_0 < \infty$ for $k \in K_4 \subset K_3$. Then,
$\bar x_k \in \Gamma_1(x_k,H_k)$ for $k \in K_4$, so for
$Q_k = (v_1^k, \ldots, v_n^k)$,

$$\|\bar x_k - x_k\|^2 = \|-(H_k + \mu_k I)^{-1} g(x_k)\|^2 = \|-Q_k (D_k + \mu_k I)^{-1} Q_k^T g(x_k)\|^2$$

$$= \|(D_k + \mu_k I)^{-1} Q_k^T g(x_k)\|^2$$

$$= \left(\frac{g(x_k)^T v_1^k}{\lambda_1(H_k) + \mu_k}\right)^2 + \cdots + \left(\frac{g(x_k)^T v_n^k}{\lambda_n(H_k) + \mu_k}\right)^2$$

$$\ge \left(\frac{g(x_k)^T v_1^k}{\lambda_1(H_k) + \mu_0}\right)^2 + \cdots + \left(\frac{g(x_k)^T v_n^k}{\lambda_n(H_k) + \mu_0}\right)^2. \qquad (2.8)$$

But the limit of the right-hand side of (2.8) when $k \to \infty$ is clearly a
nonzero positive number, therefore $\|\bar x_k - x_k\|^2$ is bounded away from
zero if $k \in K_4$ is large enough, contradicting the hypothesis. Hence,
$\lim_{k \in K_3} \mu_k = \infty$. Therefore, we may write

$$\frac{\bar x_k - x_k}{\|\bar x_k - x_k\|} = \frac{-(H_k + \mu_k I)^{-1} g(x_k)}{\|-(H_k + \mu_k I)^{-1} g(x_k)\|} = \frac{-(H_k/\mu_k + I)^{-1} g(x_k)}{\|(H_k/\mu_k + I)^{-1} g(x_k)\|},$$

and the thesis follows for the subsequence indexed by $K_3$ using boundedness
of $\{H_k\}$ and $\lim_{k \in K_3} \mu_k = \infty$. □

Now we are able to prove the following global convergence theorem. Note
that we do not assume that $\nabla^2 f(x_k)$ exists, much less that $H_k$
approximates it well.

Theorem 2.4 Assume that $\|H_k\| \le B$ for $k = 0,1,2,\ldots$,
$x_0 \in \Omega$, and $x_{k+1}$, $k = 0,1,2,\ldots$ is obtained from Algorithm
2.1. Let $x_* \in \Omega$ be a limit point of $\{x_k\}$. Then $g(x_*) = 0$.


Proof:

Assume that $x_* \in \Omega$, $x_* = \lim_{k \in K_1} x_k$ and
$g(x_*) \ne 0$. We consider two possibilities:

(a) Some subsequence of $\{\|x_{k+1} - x_k\|\}_{k \in K_1}$ is bounded away
from 0.

(b) $\lim_{k \in K_1} \|x_{k+1} - x_k\| = 0$.


Using Lemma 3.2 of Powell-Yuan [10], we see that

$$g(x_k)^T (x_{k+1} - x_k) \le -\frac{\|g(x_k)\|^2 \|x_{k+1} - x_k\|}{2\|H_k\|\,\|x_{k+1} - x_k\| + \|g(x_k)\|} \le -\frac{\|g(x_k)\|^2 \|x_{k+1} - x_k\|}{2B\Delta + \|g(x_k)\|}.$$

Hence, if (a) holds, using (2.1) and the continuity of $\nabla f$ at $x_*$, we
see that $\lim_{k\to\infty} f(x_k) = -\infty$. This contradicts the assumption
$x_* \in \Omega$.

Therefore, it remains to analyze (b). Since, in Algorithm 2.1, $x_{k+1}$ is
set to $x$ which is chosen such that $\|x - x_k\| \ge \beta^2 \Delta_k / 2$,
it follows that $\lim_{k \in K_1} \Delta_k = 0$. We consider two
possibilities:


(i) For some $K_2 \subset K_1$, $\lim_{k \in K_2} \bar\Delta_k = 0$.

(ii) For every $K_3 \subset K_1$, $\lim_{k \in K_3} \bar\Delta_k \ne 0$.
If (i) holds, then we can assume for $k \in K_2$ that $\lambda_1(H_k) > 0$
since otherwise $\bar\Delta_k = \Delta$. Thus $\bar\Delta_k$ is set in
Algorithm 2.1 to be the minimum of $\Delta$ and $\|H_k^{-1} g(x_k)\|$, and it
follows that $\lim_{k \in K_2} \|-H_k^{-1} g(x_k)\| = 0$. But

$$\|g(x_k)\| \le \|H_k\|\,\|H_k^{-1} g(x_k)\| \le B \|H_k^{-1} g(x_k)\|.$$

Hence $\lim_{k \in K_2} g(x_k) = 0$ and so, $g(x_*) = 0$, contradicting the
initial assumption.

Now consider (ii). It means that the sequence $\{\bar\Delta_k\}_{k \in K_1}$
is bounded away from zero. Therefore the first trial point of the algorithm
failed to satisfy (2.1). This is so because $\Delta_k = \bar\Delta_k$ the
first pass through the DO loop at each iteration, and our working hypothesis
at this point is $\lim_{k \in K_1} \Delta_k = 0$. Thus, for all iterations
indexed by $K_1$, there is at least one failed trial point. Let us set the
sequence of last failed trials to $\{\bar x_k\}$. We have that each
$\bar x_k$ satisfies

$$\beta^2\, 2\Delta_k \le \|\bar x_k - x_k\| \le 2\Delta_k.$$

It follows that

$$\lim_{k \in K_1} \|\bar x_k - x_k\| = 0$$

and

$$f(\bar x_k) - f(x_k) > -\alpha |g(x_k)^T (\bar x_k - x_k)| \ge -\alpha \|g(x_k)\|\,\|\bar x_k - x_k\|.$$

Hence, using the Mean Value Theorem,

$$g(x_k + \xi_k(\bar x_k - x_k))^T \frac{\bar x_k - x_k}{\|\bar x_k - x_k\|} > -\alpha \|g(x_k)\|. \qquad (2.9)$$

Now we are under the hypotheses of Lemma 2.3. So, taking limits on both
sides of (2.9) for a suitable subsequence, we obtain

$$g(x_*)^T \left(-\frac{g(x_*)}{\|g(x_*)\|}\right) \ge -\alpha \|g(x_*)\|.$$

But this inequality implies that $\alpha \ge 1$, contradicting the initial
hypothesis. Therefore the theorem is proved. □


3 Updating

In Section 2, we used a uniform bound on $\{\|H_k\|\}$ to obtain a global
convergence result for Algorithm 2.1. Algorithm 3.1 proposes a way of updating
$H_k$ that under reasonable conditions preserves uniform boundedness of
$\{\|H_k\|\}$ and, in addition, incorporates second-order information using
secant approximations.

Algorithm 3.1

Let $\mathcal{H} \subset \mathbb{R}^{n \times n}$ be a family of symmetric
matrices uniformly bounded in norm by $M$. Let $q$ be a positive integer,
$\theta \in (0, \tfrac{1}{2})$ be a small number, and
$\mathcal{T} \subset \mathbb{R}^{n \times n}$ be the set of tridiagonal
symmetric matrices. We now particularize Algorithm 2.1 by specifying that if
$k + 1 \equiv 0 \pmod q$, then we choose $H_{k+1} \in \mathcal{H}$. Otherwise,
we assume that $H_k = Q_k T_k Q_k^T$, $T_k \in \mathcal{T}$, $Q_k$ orthogonal,
and we obtain $H_{k+1}$ by the following steps:

Step 1: Let $s = s_k = Q_k^T (x_{k+1} - x_k)$. If $s$ does not satisfy

$$s_i^2 + s_{i+1}^2 \ge \theta^2 \|s\|^2, \qquad (3.1)$$

$i = 1, \ldots, n-1$, replace $s$ by any vector satisfying (3.1) with
$\|s\| = \|x_{k+1} - x_k\|$ and $x_k + Q_k s \in \Omega$. We used
$s = (\|x_{k+1} - x_k\| / \sqrt{n})\, e$ in our implementation.

Step 2: Define $y = y_k = Q_k^T [g(x_k + Q_k s) - g(x_k)]$. (Observe that
$x_k + Q_k s = x_{k+1}$ if $s$ was not replaced at Step 1.)

Step 3: Obtain $T_{k+1}$ as the solution of the problem

$$\min \|T - T_k\|_F^2 \quad \text{subject to } Ts = y,\ T \in \mathcal{T}.$$

Step 4: $H_{k+1} = Q_{k+1} T_{k+1} Q_{k+1}^T$ with $Q_{k+1} = Q_k$. (Of
course, neither $H_k$ nor $H_{k+1}$ need to be formed.)
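
Step 3 admits a closed-form solution that costs $O(n)$, and the following Python sketch shows one way to compute it. It is an illustration derived from the constrained least-squares problem itself, not the authors' implementation. The function names are hypothetical, `a` and `b` hold the diagonal and off-diagonal of $T_k$, and the multipliers `lam` solve a tridiagonal system whose entries are worked out here from the constraint $Ts = y$ (an assumption of this sketch that the test below checks numerically). The vector `s` is assumed to satisfy $s_i^2 + s_{i+1}^2 > 0$.

```python
def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric positive definite tridiagonal system."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiagonal_secant_update(a, b, s, y):
    """Least-change (Frobenius) update of T = tridiag(b, a, b) with T_new s = y."""
    n = len(a)

    def sq(i):
        # s_i^2, with zero padding outside 0..n-1
        return s[i] * s[i] if 0 <= i < n else 0.0

    # residual r = T s - y
    r = []
    for i in range(n):
        ts = a[i] * s[i]
        if i > 0:
            ts += b[i - 1] * s[i - 1]
        if i < n - 1:
            ts += b[i] * s[i + 1]
        r.append(ts - y[i])
    # multiplier system: tridiagonal, strictly diagonally dominant
    m_diag = [sq(i) + 0.5 * (sq(i - 1) + sq(i + 1)) for i in range(n)]
    m_off = [0.5 * s[i] * s[i + 1] for i in range(n - 1)]
    lam = solve_tridiagonal(m_diag, m_off, r)
    # off-diagonal entries appear twice in ||.||_F, hence the factor 1/2
    a_new = [a[i] - s[i] * lam[i] for i in range(n)]
    b_new = [b[i] - 0.5 * (s[i + 1] * lam[i] + s[i] * lam[i + 1])
             for i in range(n - 1)]
    return a_new, b_new
```

If $T_k s = y$ already holds, the residual is zero and the update returns $T_k$ unchanged, which is the behavior one expects from a least-change update.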

The solution of the problem in Step 3 may be obtained using the least-change
theory for updates by an algorithm which will be described in Section 5. See
for example Dennis-Schnabel [3] Chapter 7. The rest of this section is
essentially to prove that the sequence of matrices obtained using Algorithms
2.1 and 3.1 is bounded, and so, that the global convergence Theorem 2.4 holds.
Some auxiliary lemmas will be necessary.

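Lemma 3.1 Let $s$ be such that $s_i^2 + s_{i+1}^2 > 0$, $i = 1, \ldots, n-1$.
Define $A \in \mathbb{R}^{n \times (2n-1)}$ as:

$$A = \begin{pmatrix}
s_1 & s_2/\sqrt{2} & & & & & \\
 & s_1/\sqrt{2} & s_2 & s_3/\sqrt{2} & & & \\
 & & & s_2/\sqrt{2} & s_3 & s_4/\sqrt{2} & \\
 & & & & \ddots & \ddots & \ddots \\
 & & & & s_{n-2}/\sqrt{2} & s_{n-1} & s_n/\sqrt{2} \\
 & & & & & s_{n-1}/\sqrt{2} & s_n
\end{pmatrix} \qquad (3.2)$$

Then rank $A = n$.


Proof:

Form $AA^T$ and note that it is symmetric and strictly diagonally dominant.

□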
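
The diagonal dominance used in this proof is easy to check numerically. The sketch below is an illustration only: the function names are hypothetical, and the entries of $AA^T$ are written out here from the definition of $A$ (diagonal $s_i^2 + (s_{i-1}^2 + s_{i+1}^2)/2$, off-diagonal $s_i s_{i+1}/2$), which is an assumption of this sketch rather than a formula quoted from the paper.

```python
def gram_tridiagonal(s):
    """Diagonal and off-diagonal of A A^T for the matrix A built from s."""
    n = len(s)

    def sq(i):
        # s_i^2, with zero padding outside 0..n-1
        return s[i] * s[i] if 0 <= i < n else 0.0

    diag = [sq(i) + 0.5 * (sq(i - 1) + sq(i + 1)) for i in range(n)]
    off = [0.5 * s[i] * s[i + 1] for i in range(n - 1)]
    return diag, off


def strictly_diagonally_dominant(diag, off):
    """True if each diagonal entry exceeds the sum of |off-diagonals| in its row."""
    n = len(diag)
    for i in range(n):
        row = 0.0
        if i > 0:
            row += abs(off[i - 1])
        if i < n - 1:
            row += abs(off[i])
        if diag[i] <= row:
            return False
    return True
```

A vector with two leading zeros makes the first row of $A$ vanish, and the check then fails, as expected.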


Corollary 3.1 Under condition (3.1), if $s \ne 0$, rank $A = n$.

Proof:

Trivial using Lemma 3.1. □

Under condition (3.1) and $s \ne 0$, either
$|s_1| \ge \frac{\theta}{\sqrt{2}}\|s\|$ or
$|s_n| \ge \frac{\theta}{\sqrt{2}}\|s\|$. Let us suppose, without loss of
generality, that $|s_n| \ge \frac{\theta}{\sqrt{2}}\|s\|$ (otherwise the
following lemma may be reformulated in an obvious way).

Lemma 3.2 Let $\beta_i$ be the angle between the row $i+1$ of $A$ and the
subspace $S_i$ spanned by the $i$ first rows. Assume $s \ne 0$ and (3.1). Then
$|\sin\beta_i| \ge \theta/\sqrt{2}$, $i = 1, \ldots, n-1$.


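Proof:

Consider $S_i'$, the subspace of $\mathbb{R}^{2n-1}$ formed by the vectors of
the form:

$$(x_1, \ldots, x_{2i}, 0, \ldots, 0).$$

Obviously, $S_i \subset S_i'$, $i = 1, \ldots, n-1$.

Let $\beta_i'$ be the angle between the row $i+1$ of $A$ and $S_i'$, and let
$r_{i+1}$ denote that row. Then, $|\sin\beta_i| \ge |\sin\beta_i'|$. Now if
$i \le n-2$, then

$$|\sin\beta_i'| = \frac{\sqrt{s_{i+1}^2 + s_{i+2}^2/2}}{\|r_{i+1}\|} \ge \frac{\theta}{\sqrt{2}}.$$

If $i = n-1$, then

$$|\sin\beta_i'| = \frac{|s_n|}{\|r_n\|} \ge \frac{|s_n|}{\|s\|} \ge \frac{\theta}{\sqrt{2}},$$

so, $|\sin\beta_i| \ge |\sin\beta_i'| \ge \theta/\sqrt{2}$,
$i = 1, \ldots, n-1$.

□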


Lemma 3.3 The product $\Pi = \prod_{i=1}^{n-1} |\sin\beta_i|$ is invariant
under permutations of the rows of $A$.


Proof:

Set $\tilde A = \begin{pmatrix} A \\ H \end{pmatrix}$ such that $\tilde A$ is
nonsingular. Suppose further that the rows of $H$ are orthogonal and span the
orthogonal complement to the rows of $A$. Thus (see [6])

$$\Pi = \frac{|\det \tilde A|}{W} \qquad (3.3)$$

where $W$ is the product of the norms of the rows of $\tilde A$. But the right
hand side of (3.3) is invariant under permutations of the rows of $\tilde A$
(and hence, of $A$), so, the same happens with $\Pi$. □


Lemma 3.4 Let $\gamma_i$ be the angle between the row $i$ of $A$ and the
subspace spanned by the other rows of $A$. Then
$|\sin\gamma_i| \ge \Pi \ge \theta^{n-1} 2^{-\frac{n-1}{2}}$.


Proof:

Fix the row $i$ and permute the rows of $A$ so that row $i$ becomes the last
one. So $|\sin\gamma_i| = |\sin\beta_{n-1}| \ge \Pi = \prod_{i=1}^{n-1} |\sin\beta_i| \ge \left(\frac{\theta}{\sqrt{2}}\right)^{n-1}$. □


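Lemma 3.5 Let $s \ne 0$ and $A^+ = A^T (AA^T)^{-1}$. Then
$A^+ \in \mathbb{R}^{(2n-1) \times n}$. Let

$$A^+ = (h_1, \ldots, h_n), \quad A = \begin{pmatrix} r_1^T \\ \vdots \\ r_n^T \end{pmatrix}.$$

Then

$$\|h_i\| \le \frac{2^{\frac{n-1}{2}}}{\theta^n \|s\|}. \qquad (3.4)$$


Proof:

Each column $h_i$ of $A^+$ is a linear combination of $r_1, \ldots, r_n$.
Moreover $h_i^T r_i = 1$ and $h_i^T r_j = 0$ if $j \ne i$. Let $S$ be the
subspace spanned by $\{r_1, \ldots, r_n\}$ (and hence, by
$\{h_1, \ldots, h_n\}$). Each $r_i$ may be expressed as

$$r_i = v_i + w_i,$$

where $v_i$ is the projection of $r_i$ on the subspace spanned by
$\{r_j, j \ne i\}$ and $w_i$ is the projection of $r_i$ into the line spanned
by $h_i$. So

$$\|w_i\| = \frac{|h_i^T r_i|}{\|h_i\|} = \frac{1}{\|h_i\|}. \qquad (3.5)$$

But

$$\sin\gamma_i = \frac{\|w_i\|}{\|r_i\|}, \quad \frac{\|w_i\|}{\|r_i\|} \ge \left(\frac{\theta}{\sqrt{2}}\right)^{n-1}. \qquad (3.6)$$

Thus, by (3.5) and (3.6),
$\frac{1}{\|h_i\|\,\|r_i\|} \ge \left(\frac{\theta}{\sqrt{2}}\right)^{n-1}$
and hence

$$\|h_i\| \le \left(\frac{\theta}{\sqrt{2}}\right)^{1-n} \frac{1}{\|r_i\|}.$$

But $\|r_i\| \ge \theta \|s\|$, so

$$\|h_i\| \le \frac{2^{\frac{n-1}{2}}}{\theta^n \|s\|}. \qquad (3.7)$$

□


Lemma 3.6 If $s \ne 0$, then for any norm $|\cdot|$ fixed in
$\mathbb{R}^{(2n-1) \times n}$, there exists a constant
$K_1 = K_1(|\cdot|, \theta, n)$ such that $|A^+| \le K_1 / \|s\|$.

Proof:

Consequence of (3.7).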

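The results above are going to be used in a "vector formulation of the
least-change update." Let us write

$$T_k = \begin{pmatrix}
a_1^k & b_1^k & & \\
b_1^k & a_2^k & b_2^k & \\
 & \ddots & \ddots & \ddots \\
 & & b_{n-1}^k & a_n^k
\end{pmatrix} \qquad (3.8)$$

$$T = \begin{pmatrix}
a_1 & b_1 & & \\
b_1 & a_2 & b_2 & \\
 & \ddots & \ddots & \ddots \\
 & & b_{n-1} & a_n
\end{pmatrix} \qquad (3.9)$$

The least-change update is the solution of

$$\min \|T - T_k\|_F^2 \quad \text{subject to } Ts = y,\ T \in \mathcal{T}. \qquad (3.10)$$

By (3.8) and (3.9), (3.10) may be formulated as follows:

$$\min\ (a_1 - a_1^k)^2 + 2(b_1 - b_1^k)^2 + (a_2 - a_2^k)^2 + \cdots + 2(b_{n-1} - b_{n-1}^k)^2 + (a_n - a_n^k)^2$$

$$\text{s.t.} \quad \begin{cases}
a_1 s_1 + b_1 s_2 = y_1 \\
b_1 s_1 + a_2 s_2 + b_2 s_3 = y_2 \\
\quad \vdots \\
b_{n-1} s_{n-1} + a_n s_n = y_n
\end{cases} \qquad (3.11)$$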
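Let us now consider the isomorphism $\Phi$ between $\mathcal{T}$ and
$\mathbb{R}^{2n-1}$, which maps

$$T = \begin{pmatrix}
a_1 & b_1 & & \\
b_1 & a_2 & b_2 & \\
 & \ddots & \ddots & \ddots \\
 & & b_{n-1} & a_n
\end{pmatrix}
\ \overset{\Phi}{\longmapsto}\
\begin{pmatrix} a_1 \\ b_1 \\ a_2 \\ \vdots \\ b_{n-1} \\ a_n \end{pmatrix} = t \qquad (3.12)$$

We write $\Phi(T) = t$, $\Phi(T_k) = t_k$, and so on. Therefore, the problem
(3.11) may be written in $\mathbb{R}^{2n-1}$ as:

$$\min_t \|t - t_k\|_G^2 = (t - t_k)^T G (t - t_k) \quad \text{with } G = \mathrm{diag}(1, 2, 1, 2, \ldots, 2, 1)$$

$$\text{s.t. } A_k t = y \qquad (3.13)$$

where

$$A_k = \begin{pmatrix}
s_1 & s_2 & & & & & \\
 & s_1 & s_2 & s_3 & & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & & s_{n-1} & s_n
\end{pmatrix}
\quad \text{and } s_i = (s_k)_i.$$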

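By Lemma 3.1, the matrix $\bar A_k = A_k G^{-1/2}$ has full rank, so by
straightforward calculations, the solution of (3.13) is

$$t_{k+1} = t_k - G^{-1/2} \bar A_k^+ (A_k t_k - y) \qquad (3.14)$$

where $\bar A_k$ is defined in (3.2) and

$$\bar A_k^+ = \bar A_k^T (\bar A_k \bar A_k^T)^{-1} = G^{-1/2} A_k^T (A_k G^{-1} A_k^T)^{-1}. \qquad (3.15)$$

So

$$t_{k+1} = t_k - G^{-1} A_k^T (A_k G^{-1} A_k^T)^{-1} (A_k t_k - y). \qquad (3.16)$$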
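
One standard route to (3.16), offered here as a sketch rather than as the authors' own derivation, is through Lagrange multipliers. The multiplier vector $\lambda \in \mathbb{R}^n$ and the factor $\tfrac{1}{2}$ in the objective are conventions introduced only for this derivation.

```latex
% Minimize (1/2)(t - t_k)^T G (t - t_k) subject to A_k t = y.
\begin{align*}
\mathcal{L}(t,\lambda) &= \tfrac{1}{2}(t - t_k)^T G (t - t_k) + \lambda^T (A_k t - y), \\
\nabla_t \mathcal{L} = 0 \;&\Longrightarrow\; G(t - t_k) + A_k^T \lambda = 0
   \;\Longrightarrow\; t = t_k - G^{-1} A_k^T \lambda, \\
A_k t = y \;&\Longrightarrow\; A_k t_k - A_k G^{-1} A_k^T \lambda = y
   \;\Longrightarrow\; \lambda = (A_k G^{-1} A_k^T)^{-1} (A_k t_k - y), \\
t_{k+1} &= t_k - G^{-1} A_k^T (A_k G^{-1} A_k^T)^{-1} (A_k t_k - y).
\end{align*}
```

The inverse exists because $A_k G^{-1} A_k^T = \bar A_k \bar A_k^T$ and $\bar A_k$ has full row rank.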

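Therefore

$$\|t_{k+1}\|_G \le \|[I - G^{-1} A_k^T (A_k G^{-1} A_k^T)^{-1} A_k]\, t_k\|_G + \|G^{-1/2} \bar A_k^+ y\|_G.$$

But $(I - G^{-1} A_k^T (A_k G^{-1} A_k^T)^{-1} A_k)\, t_k$ is the solution of

$$\min \|t - t_k\|_G \quad \text{s.t. } A_k t = 0$$

so $\|(I - G^{-1} A_k^T (A_k G^{-1} A_k^T)^{-1} A_k)\, t_k\|_G \le \|t_k\|_G$.
Therefore

$$\|t_{k+1}\|_G \le \|t_k\|_G + \|G^{-1/2} \bar A_k^+ y\|_G. \qquad (3.17)$$

Now,

$$\|G^{-1/2} \bar A_k^+ y\|_G \le \|G^{-1/2}\|_G\, \|\bar A_k^+ y\|_G$$

and

$$\|\bar A_k^+ y\|_G = \|y_1 h_1 + \cdots + y_n h_n\|_G \le |y_1|\,\|h_1\|_G + \cdots + |y_n|\,\|h_n\|_G \le \|y\| (\|h_1\|_G + \cdots + \|h_n\|_G).$$

But $\|h_1\|_G + \cdots + \|h_n\|_G$ defines a norm in
$\mathbb{R}^{(2n-1) \times n}$, so by Lemma 3.6,

$$\|\bar A_k^+ y\|_G \le K_1 \frac{\|y\|}{\|s\|}$$

and so

$$\|G^{-1/2} \bar A_k^+ y\|_G \le K_2 \frac{\|y\|}{\|s\|}. \qquad (3.18)$$

□

Now we are able to prove the main result of this section. Let
$L_0 = \{x : f(x) \le f(x_0)\}$.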


Theorem 3.7 Assume that $L_0$ is bounded and contained in $\Omega$,
$f \in C^2(\Omega)$, $\Omega$ convex, and that for some $L > 0$,

$$\|\nabla^2 f(x) - \nabla^2 f(w)\| \le L \|x - w\| \qquad (3.19)$$

for all $x, w \in \Omega$.

Assume that the sequences $\{x_k\}$ and $\{H_k\}$ are generated using
Algorithms 2.1 and 3.1. Then the sequence $\{H_k\}$ is bounded by some
constant $B$.

Proof:

Since $\{x_k\}$ is generated by Algorithm 2.1 and Algorithm 3.1,
$\|s\| = \|x_{k+1} - x_k\|$, and Lemma 2.1 implies that $\|s\| = 0$ only if
$\{x_k\}$ converges to a stationary point in finitely many steps. Using
(3.19), we have

$$\|y - \nabla^2 f(x_k) s\| \le \frac{L}{2} \|s\|^2.$$

Since $\|\nabla^2 f(x)\|$ is bounded uniformly on $L_0$ by continuity, and
since $\{x_k\}$ contained in $L_0$ implies that $\|s\|$ is uniformly bounded,

$$\|y\| \le \|\nabla^2 f(x_k)\|\,\|s\| + \frac{L}{2}\|s\|^2 \le K_3 \|s\|$$

for a suitably defined constant $K_3$. If $k + 1 \not\equiv 0 \pmod q$, then
by (3.17) and (3.18),

$$\|t_{k+1}\|_G \le \|t_k\|_G + K_2 \frac{\|y\|}{\|s\|} \le \|t_k\|_G + K_2 K_3. \qquad (3.20)$$

Hence, by (3.20),

$$\|H_{k+1}\| = \|T_{k+1}\| \le \|T_{k+1}\|_F = \|t_{k+1}\|_G \le \|t_k\|_G + K_2 K_3$$

$$= \|T_k\|_F + K_2 K_3 \le \sqrt{n}\,\|T_k\| + K_2 K_3 = \sqrt{n}\,\|H_k\| + K_2 K_3$$

$$\le (\sqrt{n})^q M + q \sqrt{n}\, K_2 K_3.$$

□


Corollary 3.2 Under the hypothesis of Theorem 3.7, the sequence $\{x_k\}$ is
well defined by Algorithms 2.1 and 3.1, and there is at least one limit point
of the sequence. Every limit point is a stationary point for $f$.

Proof:

Directly from Theorem 2.4, Theorem 3.7, and the compactness of $L_0$.


4 Local Superlinear Convergence

In Section 3, we proved that Algorithm 2.1, with the approximate Hessian
matrices $\{H_k\}$ chosen by Algorithm 3.1, is globally convergent in the
sense that every limit point of the sequence $\{x_k\}$ must satisfy the
first-order stationary condition. In this section, we will do two things at
once by doing a local analysis of the direct-prediction method associated with
the tridiagonal factor update method. This means that we will take
$x_{k+1} = x_k - H_k^{-1} \nabla f(x_k)$. Unhappily, the good local behavior
of this iteration imposes that $H_k = \nabla^2 f(x_k)$ if
$k \equiv 0 \pmod q$. First, we will prove some strong bounded deterioration
results for $\{H_k\}$ which will be crucial to our global convergence result
in Section 5. Then, almost as a side light to the main theme of this paper, we
will prove that the direct-prediction method is locally q-superlinearly
convergent to stationary points at which the Hessian is nonsingular. It will
turn out that this result is also useful in the global analysis of Section 5.

Let us define the algorithm under consideration in this section as an
independent algorithm.

Algorithm 4.1

Assume that $x_0 \in \mathbb{R}^n$, $H_0 = \nabla^2 f(x_0)$. Given
$x_k \in \mathbb{R}^n$, $H_k \in \mathbb{R}^{n \times n}$,
$H_k = Q_k T_k Q_k^T$, $Q_k$ orthogonal, $T_k \in \mathcal{T}$, obtain
$x_{k+1}$, $H_{k+1}$ as follows:

Step 1: $x_{k+1} = x_k - H_k^{-1} \nabla f(x_k)$.

Step 2: If $k + 1 \equiv 0 \pmod q$, set $H_{k+1} = \nabla^2 f(x_{k+1})$.
Else, obtain $H_{k+1}$ using Algorithm 3.1.
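
The iteration of Algorithm 4.1 can be illustrated with a short Python sketch under strong simplifying assumptions, all of which are choices made here and not taken from the paper: the test objective $f(x) = \sum_i (x_i^4/4 + x_i^2/2 - x_i) + \tfrac{1}{2}\sum_i (x_{i+1} - x_i)^2$ is hypothetical and was picked because its Hessian is already tridiagonal, so $Q_k = I$ throughout; the safeguard of Step 1 of Algorithm 3.1 is omitted; and a gradient tolerance `tol` is added as a stopping test. The helper names are invented for the sketch.

```python
import math


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm (no pivoting) for a symmetric tridiagonal system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def secant_update(a, b, s, y):
    """Least-change tridiagonal update with T_new s = y (sketch)."""
    n = len(a)
    sq = lambda i: s[i] * s[i] if 0 <= i < n else 0.0
    r = [a[i] * s[i]
         + (b[i - 1] * s[i - 1] if i > 0 else 0.0)
         + (b[i] * s[i + 1] if i < n - 1 else 0.0)
         - y[i] for i in range(n)]
    m_diag = [sq(i) + 0.5 * (sq(i - 1) + sq(i + 1)) for i in range(n)]
    m_off = [0.5 * s[i] * s[i + 1] for i in range(n - 1)]
    lam = solve_tridiagonal(m_diag, m_off, r)
    a_new = [a[i] - s[i] * lam[i] for i in range(n)]
    b_new = [b[i] - 0.5 * (s[i + 1] * lam[i] + s[i] * lam[i + 1])
             for i in range(n - 1)]
    return a_new, b_new


def grad(x):
    """Gradient of the hypothetical test objective described above."""
    g = [xi ** 3 + xi - 1.0 for xi in x]
    for i in range(len(x) - 1):
        g[i] += x[i] - x[i + 1]
        g[i + 1] += x[i + 1] - x[i]
    return g


def hessian(x):
    """Tridiagonal Hessian of the test objective as (diagonal, off-diagonal)."""
    a = [3.0 * xi * xi + 1.0 for xi in x]
    for i in range(len(x) - 1):
        a[i] += 1.0
        a[i + 1] += 1.0
    return a, [-1.0] * (len(x) - 1)


def direct_prediction(x, q=3, tol=1.0e-10, max_iter=50):
    """Full-step iteration with a true Hessian every q-th step (Q_k = I)."""
    a, b = hessian(x)
    for k in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        s = solve_tridiagonal(a, b, [-gi for gi in g])   # Step 1
        x_new = [xi + si for xi, si in zip(x, s)]
        if (k + 1) % q == 0:                             # Step 2: refresh
            a, b = hessian(x_new)
        else:                                            # Step 2: secant update
            y = [gn - go for gn, go in zip(grad(x_new), g)]
            a, b = secant_update(a, b, s, y)
        x = x_new
    return x
```

Started from the origin with $n = 3$, the iterates of this example stay on the diagonal $x_1 = x_2 = x_3$ and approach the root of $x^3 + x = 1$.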

Let us state the assumptions on $f$ which allow us to obtain a local
superlinear convergence result.

Assumption 4.1

Let $f \in C^2(\Omega)$, $\Omega$ an open and convex set. We assume that
$x_* \in \Omega$ is such that $\nabla^2 f(x_*)$ is symmetric and nonsingular.
Further, we assume that (3.19) holds for all $x, w \in \Omega$.

Let $P_{\mathcal{T}}$ denote the Frobenius norm projection operator onto the
subspace of symmetric tridiagonal matrices $\mathcal{T}$.
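
For a symmetric matrix, this projection amounts to discarding every entry outside the three central diagonals. A minimal Python sketch follows; the function names are hypothetical and the input is assumed symmetric, so that the result is a symmetric tridiagonal matrix.

```python
import math


def project_tridiagonal(M):
    """Frobenius projection of a symmetric matrix onto tridiagonal matrices."""
    n = len(M)
    return [[M[i][j] if abs(i - j) <= 1 else 0.0 for j in range(n)]
            for i in range(n)]


def frobenius_distance(A, B):
    """Frobenius norm of A - B for list-of-rows matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))
```

Because the in-band entries are kept exactly, the distance to the projection is the norm of the discarded entries, and no other tridiagonal matrix can be closer.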


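Lemma 4.1 Assume that $k \equiv 0 \pmod q$ and that $x_k \in \Omega$ is well
defined. Then,

$$\|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_*) Q_k\|_F \le 2\sqrt{n}\, L \|x_k - x_*\|.$$


Proof:

$$\|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_k) Q_k\|_F + \|Q_k^T \nabla^2 f(x_k) Q_k - Q_k^T \nabla^2 f(x_*) Q_k\|_F.$$

But $Q_k^T \nabla^2 f(x_k) Q_k \in \mathcal{T}$. Therefore,

$$\|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_k) Q_k\|_F$$

$$= \|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_k) Q_k)\|_F$$

$$\le \|Q_k^T \nabla^2 f(x_*) Q_k - Q_k^T \nabla^2 f(x_k) Q_k\|_F.$$


Hence, by (3.19),

$$\|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_*) Q_k\|_F \le 2\|Q_k^T (\nabla^2 f(x_k) - \nabla^2 f(x_*)) Q_k\|_F$$

$$\le 2\sqrt{n}\, \|Q_k^T (\nabla^2 f(x_k) - \nabla^2 f(x_*)) Q_k\|$$

$$= 2\sqrt{n}\, \|\nabla^2 f(x_k) - \nabla^2 f(x_*)\|$$

$$\le 2\sqrt{n}\, L \|x_k - x_*\|.$$


□

From now on, let us use the notation $e_\ell = \|x_\ell - x_*\|$,
$\ell = 0, 1, 2, \ldots$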
Lemma 4.2 Assume that $k \equiv 0 \pmod q$, $0 \le j \le q-1$, and that
$x_{k+j}$, $x_{k+j+1}$, $x_{k+j} + Q_k s_{k+j}$ are well defined and belong to
$\Omega$. Then,

$$\|y_{k+j} - [P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)]\, s_{k+j}\| \le L \|s_{k+j}\| \left(e_{k+j} + \tfrac{1}{2} e_{k+j+1} + 2\sqrt{n}\, e_k\right).$$

Proof:

$$\|y_{k+j} - [P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)]\, s_{k+j}\| \le \|y_{k+j} - Q_k^T \nabla^2 f(x_*) Q_k s_{k+j}\|$$

$$+ \|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_*) Q_k\|\, \|s_{k+j}\|. \qquad (4.1)$$

But, by (3.19), and the definition of $y_{k+j}$,

$$\|y_{k+j} - Q_k^T \nabla^2 f(x_*) Q_k s_{k+j}\|$$

$$= \|g(x_{k+j} + Q_k s_{k+j}) - g(x_{k+j}) - \nabla^2 f(x_*) Q_k s_{k+j}\|$$

$$\le \frac{L}{2} \|s_{k+j}\| \max\{e_{k+j},\ \|x_{k+j} + Q_k s_{k+j} - x_*\|\}. \qquad (4.2)$$

Therefore, by (4.1), (4.2) and Lemma 4.1,

$$\|y_{k+j} - [P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)]\, s_{k+j}\|$$

$$\le \frac{L}{2} \|s_{k+j}\| \max\{e_{k+j},\ \|x_{k+j} + Q_k s_{k+j} - x_*\|\} + 2\sqrt{n}\, L \|s_{k+j}\|\, e_k.$$

Now, even if $s_{k+j} \ne x_{k+j+1} - x_{k+j}$, they are equal in norm, so

$$\|x_{k+j} + Q_k s_{k+j} - x_*\| \le e_{k+j} + \|s_{k+j}\| = e_{k+j} + \|x_{k+j+1} - x_{k+j}\| \le 2 e_{k+j} + e_{k+j+1}.$$

Therefore,

$$\|y_{k+j} - [P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)]\, s_{k+j}\|$$

$$\le \frac{L}{2} \|s_{k+j}\| (2 e_{k+j} + e_{k+j+1}) + 2\sqrt{n}\, L \|s_{k+j}\|\, e_k$$

$$= L \|s_{k+j}\| \left(e_{k+j} + \tfrac{1}{2} e_{k+j+1} + 2\sqrt{n}\, e_k\right),$$

as we wanted to prove. □

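The following lemma states a Bounded Deterioration Principle (see [1]) for
the matrices $T_k$.

Lemma 4.3 Assume that $k \equiv 0 \pmod q$, $0 \le j \le q-2$, and that
$x_{k+j}$, $x_{k+j+1}$, $x_{k+j} + Q_k s_{k+j}$ are well-defined and belong to
$\Omega$. Then,

$$\|T_{k+j+1} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\|_F$$

$$\le \|T_{k+j} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\|_F + K_2 L \left(e_{k+j} + \tfrac{1}{2} e_{k+j+1} + 2\sqrt{n}\, e_k\right).$$

Proof: For matrices $T \in \mathcal{T}$, remember that
$\|T\|_F = \|\Phi(T)\|_G$, where $\Phi$ is the isomorphism which maps
$\mathcal{T}$ into $\mathbb{R}^{2n-1}$. The matrices
$T_{k+j+1} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)$ and
$T_{k+j} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)$ belong to
$\mathcal{T}$. So, using the convention $t = \Phi(T)$, we are going to prove
the thesis in $\mathbb{R}^{2n-1}$ using $\|\cdot\|_G$.

By (3.16) we have, writing $y = y_{k+j}$,
$t_* = \Phi(P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k))$,
$t_{k+j+1} = t_{k+j} - G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (A_{k+j} t_{k+j} - y)$.
So,

$$t_{k+j+1} - t_* = t_{k+j} - t_* - G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (A_{k+j} t_{k+j} - y)$$

$$= t_{k+j} - t_* - G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (A_{k+j} t_{k+j} - A_{k+j} t_* + A_{k+j} t_* - y)$$

$$= [I - G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} A_{k+j}](t_{k+j} - t_*) + G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (y - A_{k+j} t_*).$$

Hence,

$$\|t_{k+j+1} - t_*\|_G$$

$$\le \|[I - G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} A_{k+j}](t_{k+j} - t_*)\|_G + \|G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (y - A_{k+j} t_*)\|_G$$

$$\le \|t_{k+j} - t_*\|_G + \|G^{-1} A_{k+j}^T (A_{k+j} G^{-1} A_{k+j}^T)^{-1} (y - A_{k+j} t_*)\|_G.$$

Therefore, using the arguments which lead to (3.18), we have:

$$\|t_{k+j+1} - t_*\|_G \le \|t_{k+j} - t_*\|_G + K_2 \frac{\|y - A_{k+j} t_*\|}{\|s_{k+j}\|}.$$

But $A_{k+j} t_* = P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\, s_{k+j}$. Thus,
the desired result follows using Lemma 4.2.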

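Lemma 4.4 Assume that $k \equiv 0 \pmod q$, $0 \le j \le q-2$, and that
$x_{k+j}$, $x_{k+j+1}$, $x_{k+j} + Q_k s_{k+j}$ are well-defined and belong to
$\Omega$. Then,

$$\|T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \|T_{k+j} - Q_k^T \nabla^2 f(x_*) Q_k\|_F + L \left(K_2 e_{k+j} + \frac{K_2}{2} e_{k+j+1} + 2\sqrt{n}\, (K_2 + 1)\, e_k\right).$$


Proof: By Lemmas 4.1 and 4.3, we have:

$$\|T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \|T_{k+j+1} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\|_F + \|P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k) - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \|T_{k+j} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\|_F + K_2 L \left(e_{k+j} + \tfrac{1}{2} e_{k+j+1} + 2\sqrt{n}\, e_k\right) + 2 L e_k \sqrt{n}$$

and the desired result follows trivially from this inequality, since
$T_{k+j} \in \mathcal{T}$ implies
$\|T_{k+j} - P_{\mathcal{T}}(Q_k^T \nabla^2 f(x_*) Q_k)\|_F \le \|T_{k+j} - Q_k^T \nabla^2 f(x_*) Q_k\|_F$. □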

Lemma 4.5 Assume the hypotheses of the previous lemmas. Then,

$$\|T_k - Q_k^T \nabla^2 f(x_*) Q_k\|_F \le \sqrt{n}\, L e_k, \qquad (4.3)$$

and for $0 \le j \le q-2$,

$$\|T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \sqrt{n}\, L e_k + \sum_{\nu=0}^{j} L \left(K_2 e_{k+\nu} + \frac{K_2}{2} e_{k+\nu+1} + 2\sqrt{n}\, (K_2 + 1)\, e_k\right).$$

Proof:

$$\|T_k - Q_k^T \nabla^2 f(x_*) Q_k\|_F = \|Q_k^T \nabla^2 f(x_k) Q_k - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \sqrt{n}\, \|\nabla^2 f(x_k) - \nabla^2 f(x_*)\| \le \sqrt{n}\, L e_k.$$

Thus, the desired result follows straightforwardly from the previous
inequality and Lemma 4.4.

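Lemma 4.6 Assume the hypotheses of the previous lemmas, and remember
that

$$H_\ell = Q_\ell T_\ell Q_\ell^T \quad \text{for } \ell = 1, 2, \ldots$$

Then, for some $\eta > 0$

$$\|H_{k+j+1} - \nabla^2 f(x_*)\| \le \eta \left(\sum_{\nu=0}^{j+1} e_{k+\nu}\right).$$


Proof: By Lemma 4.5,

$$\|H_{k+j+1} - \nabla^2 f(x_*)\| = \|Q_k (T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k) Q_k^T\|$$

$$\le \|T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k\| \le \|T_{k+j+1} - Q_k^T \nabla^2 f(x_*) Q_k\|_F$$

$$\le \sqrt{n}\, L e_k + \sum_{\nu=0}^{j} L \left(K_2 e_{k+\nu} + \frac{K_2}{2} e_{k+\nu+1} + 2\sqrt{n}\, (K_2 + 1)\, e_k\right)$$

and the result follows directly.

□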


Theorem 4.7 There exists $\epsilon > 0$ such that for any $x_0$ with $\|x_0 - x_*\| < \epsilon$, the sequence $\{x_\ell\}$ generated by Algorithm 4.1 converges q-superlinearly to $x_*$. Furthermore, if $\epsilon q \eta \|\nabla^2 f(x_*)^{-1}\| \le \gamma < 1$, then the sequence $\{\|H_\ell^{-1}\|\}$ is uniformly bounded by the constant $B_N = \|\nabla^2 f(x_*)^{-1}\| / (1 - \gamma)$, independent of the particular choice of $x_0$.

Proof: Algorithm 4.1 is locally linearly convergent and $\{\|H_\ell^{-1}\|\}$ is uniformly bounded if the matrices $H_k$ remain in a suitable neighborhood of $\nabla^2 f(x_*)$ (see [3], Chapter 7). This condition is easily verified using Lemma 4.6 if $x_0$ is close enough to $x_*$. The reason this condition and the bound on the inverses can be independent of the particular $x_0$ is that Algorithm 4.1 always takes $H_0 = \nabla^2 f(x_0)$. In particular,

$$\|H_\ell - \nabla^2 f(x_*)\| \le \epsilon q \eta$$

and so the bound $B_N$ follows from the Banach Lemma (see [3]). Now, using linear convergence and Lemma 4.6, we see that $\lim_{k \to \infty} H_k = \nabla^2 f(x_*)$. This implies that convergence is q-superlinear (see [1]). □


5  Global  Superlinear  Convergence 

In Section 3, we proved that Algorithm 2.1, with the approximate Hessian matrices $\{H_k\}$ chosen by Algorithm 3.1, is globally convergent in the sense that every limit point of the sequence $\{x_k\}$ is a first-order stationary point. In Section 4, we proved that if we require the Hessian update method to always choose $H_k = \nabla^2 f(x_k)$ every $q$ iterations, then the direct-prediction method is locally q-superlinearly convergent to stationary points at which the Hessian is nonsingular. In this section, we put all this together. We update the Hessian approximations as in Section 4, and we modify Algorithm 2.1 to always try the full quasi-Newton step first when $H_k$ is positive definite. We then prove that if $f$ is quasi-convex on $L_0$ and $\nabla^2 f_* = \nabla^2 f(x_*)$ is positive definite for some stationary point $x_*$, then from some point on, the Newton steps satisfy the sufficient decrease condition (2.1).

Algorithm  5.1 

Assume that $x_0 \in \mathbb{R}^n$, $H_0 = \nabla^2 f(x_0)$. Given $x_k \in \mathbb{R}^n$, $H_k \in \mathbb{R}^{n \times n}$, $H_k = Q_k T_k Q_k^T$, $Q_k$ orthogonal, $T_k \in \mathcal{T}$, obtain $x_{k+1}$, $H_{k+1}$ as follows:

Step 1: If $H_k$ is positive definite, then in Algorithm 2.1, first try $x_{k+1} = x_k - H_k^{-1} \nabla f(x_k)$.

Step 2: If $k + 1 \equiv 0 \pmod{q}$, set $H_{k+1} = \nabla^2 f(x_{k+1})$. Else, obtain $H_{k+1} using Algorithm 3.1$. Return to Step 1.
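The control flow of Algorithm 5.1 can be illustrated in one dimension, where "positive definite" reduces to $H > 0$. The sketch below is only an illustration under stated assumptions: the function name `algorithm_5_1_scalar` is hypothetical, the secant quotient stands in for the update of Algorithm 3.1, and increasing $\mu$ in $s(\mu) = -g/(H + \mu)$ stands in for the curvilinear search of Algorithm 2.1. Neither stand-in is the routine used in the paper.

```python
def algorithm_5_1_scalar(f, grad, hess, x0, q=3, alpha=1e-4, tol=1e-10, max_iter=200):
    """One-dimensional illustration of the structure of Algorithm 5.1.

    Stand-ins (assumptions, not the paper's routines):
      * the secant quotient replaces the update of Algorithm 3.1;
      * increasing mu in s(mu) = -g / (H + mu) replaces the search of Algorithm 2.1.
    """
    x = x0
    H = hess(x)                      # H_0 is the exact second derivative
    for k in range(max_iter):
        g = grad(x)
        if abs(g) <= tol:
            break
        # Step 1: when H > 0 the first trial is the full quasi-Newton step (mu = 0).
        mu = 0.0 if H > 0 else -H + 1e-5
        for _ in range(60):
            s = -g / (H + mu)
            if f(x + s) <= f(x) + alpha * g * s:   # sufficient decrease test
                break
            mu = 2.0 * mu + 1.0      # move along the path toward steepest descent
        x_new = x + s
        if x_new == x:               # step too small to change x: stop
            break
        # Step 2: exact refresh every q iterations, secant update otherwise.
        if (k + 1) % q == 0:
            H = hess(x_new)
        else:
            H = (grad(x_new) - g) / (x_new - x)
        x = x_new
    return x
```

For example, `algorithm_5_1_scalar(lambda x: x**4 + x**2, lambda x: 4*x**3 + 2*x, lambda x: 12*x**2 + 2, 2.0)` drives the iterate toward the minimizer at 0, accepting the full step at every iteration because the function is convex.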

Now we give our main result. We assume that $f$ is quasi-convex, i.e., that all
level sets of $f$ are convex.

Theorem 5.1 Let $f \in C^2(\Omega)$, with $\Omega$ an open and convex set containing $L_0$, be a quasi-convex function on $L_0$. Assume that $L_0$ is bounded, and that some stationary point $x_* \in \Omega$ is such that $\nabla^2 f(x_*)$ is positive definite. Further, assume that the Lipschitz condition on the Hessian given by (3.19) holds for all $x, w \in \Omega$. Then, there exists some integer $k_N$ such that Algorithm 5.1 takes $\mu_k = 0$ for $k \ge k_N$, and $\{x_k\}$ converges q-superlinearly to $x_*$, which is the global minimizer of $f$.


Proof: 

Since $f$ is quasi-convex and has a stationary point $x_*$ at which $\nabla^2 f(x_*)$ is positive definite, $x_*$ must be the unique stationary point for $f$ on $L_0$, and the global minimizer of $f$.

Since $L_0$ is bounded and $\nabla^2 f$ is continuous, we can take $\mathcal{H} = \{\nabla^2 f(x) : x \in L_0\}$. Thus, from Corollary 3.2, we have that $\{x_k\}$ is well defined and some subsequence converges to a stationary point, which must then be $x_*$. Furthermore, there is some $B \ge \|H_k\|$ uniformly in $k$. Since $x_*$ is the only possible limit point of $\{x_k\}$, the compactness of $L_0$ ensures that $\lim_k x_k = x_*$. In particular, the subsequence of the iterates indexed by $k \equiv 0 \pmod{q}$ converges to $x_*$.

The key to the proof will be to show below that eventually, starting at one of the $k \equiv 0 \pmod{q}$ iterates, Algorithm 5.1 reduces to Algorithm 4.1, i.e., the step $s_k = -H_k^{-1} g_k$ eventually satisfies (2.1).

Let $\epsilon$ be small enough that Algorithm 4.1 is locally q-linearly convergent to $x_*$ from any $x_0^N$ with $\|x_0^N - x_*\| < \epsilon$. Now, let $B_N$ be as in Theorem 4.7. Choose $\epsilon$ even smaller if necessary to make $1 - 2\alpha > (q\eta + L) B_N \epsilon$. The standard approach to proving Theorem 4.7 chooses $\epsilon$ so that if $\nabla^2 f(x_*)$ is positive definite then so are all $H_k$ for $\|x_0 - x_*\| < \epsilon$. Choose $k_N \equiv 0 \pmod{q}$ so that if $k \ge k_N$, then $\|x_k - x_*\| < \epsilon$.

There are still a couple of small points to deal with before we start to chain
inequalities. First, since $H_k$ is positive definite, we have $g_k^T s_k < 0$, and

$$\|s_k\|^2 = (H_k^{-1} g_k)^T (H_k^{-1} g_k) \le \|H_k^{-1}\| \, (H_k^{-1} g_k)^T g_k = -\|H_k^{-1}\| \, g_k^T s_k .$$


Furthermore, $x_k^N$ and $x_k^N + s_k^N$ are both within $\epsilon$ of $x_*$. Thus, any convex combination is also, and so for any $\xi \in (0, 1)$, $\|x_k^N + \xi s_k^N - x_*\| < \epsilon$.

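Now the proof that $\{x_k^N\} = \{x_k\}$ for $k \ge k_N$ is by Taylor's Theorem and all these partial results. It can be done by induction, but we give only the main step here. Assume that the sequences are identical from the $k_N$th to the $\ell$th iterate. Then $H_\ell = H^N_{\ell - k_N}$, and

$$\begin{aligned}
f(x_\ell + s_\ell^N) - f_\ell
&= g_\ell^T s_\ell^N + \tfrac{1}{2} (s_\ell^N)^T [\nabla^2 f(x_\ell + \xi s_\ell^N) \pm \nabla^2 f(x_*)] s_\ell^N \\
&= \tfrac{1}{2} g_\ell^T s_\ell^N + \tfrac{1}{2} (s_\ell^N)^T [\nabla^2 f(x_\ell + \xi s_\ell^N) \pm \nabla^2 f(x_*) - H_\ell] s_\ell^N \\
&\le \tfrac{1}{2} g_\ell^T s_\ell^N + \tfrac{1}{2} [L \epsilon + q \eta \epsilon] \|s_\ell^N\|^2 \\
&\le \tfrac{1}{2} g_\ell^T s_\ell^N - \tfrac{1}{2} [L + q \eta] \epsilon B_N \, g_\ell^T s_\ell^N \\
&\le \tfrac{1}{2} g_\ell^T s_\ell^N - \tfrac{1}{2} (1 - 2\alpha) g_\ell^T s_\ell^N = \alpha \, g_\ell^T s_\ell^N
\end{aligned}$$

since $H_\ell$ is positive definite and so $g_\ell^T s_\ell^N < 0$. □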


6  Implementation 

6.1  Implementation  of  steps  4  and  5  of  Algorithm  2.1 

Considering $s_k(\mu) = -(H_k + \mu I)^{-1} g(x_k)$ with

$$\mu \ge \bar{\mu} = \max(0, -\lambda_1 + \epsilon)$$

where $\lambda_1$ is the least eigenvalue of $H_k$ and $\epsilon = 10^{-5}$ in the computer implementation, we choose

$$x_{k+1} = x_k + s_k(\mu_*)$$

where $\mu_*$ is an approximate solution to the problem

(I) $\arg\min f(x_k + s_k(\mu)), \quad \mu \ge \bar{\mu}$.

In order to solve this problem it is necessary to follow the curvilinear path $s_k(\mu)$, $\mu \ge \bar{\mu}$, and therefore to find the solution of the linear system of equations

$$(H_k + \mu I)\, s_k(\mu) = -g(x_k), \quad \mu \ge \bar{\mu},$$

for several trial values of $\mu$. These computations are carried out in $O(n)$ operations because the decomposition $H_k = Q_k T_k Q_k^T$ is available. This is because we can write the equivalent system

$$(T_k + \mu I)\, \tilde{s}_k(\mu) = -\tilde{g}(x_k)$$

where $\tilde{s}_k(\mu) = Q_k^T s_k(\mu)$, $\tilde{g}(x_k) = Q_k^T g(x_k)$.
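To make the $O(n)$ cost per trial value concrete, the following sketch solves the shifted tridiagonal system with the classical Thomas algorithm (elimination without pivoting). It is an illustration only: the function names are hypothetical, and the implementation described in this paper relies on the LINPACK routine SGTSL rather than on this code. The sketch assumes the shift is large enough that the shifted matrix is positive definite, which is the situation guaranteed by the lower bound on the parameter.

```python
def solve_shifted_tridiagonal(diag, off, mu, rhs):
    """Solve (T + mu*I) y = rhs, T symmetric tridiagonal, in O(n) (Thomas algorithm).

    diag: the n diagonal entries of T; off: the n-1 off-diagonal entries.
    No pivoting: intended for mu large enough that T + mu*I is positive definite.
    """
    n = len(diag)
    d = [diag[i] + mu for i in range(n)]
    y = list(rhs)
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        y[i] -= w * y[i - 1]
    # Back substitution on the remaining upper bidiagonal system.
    y[-1] /= d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (y[i] - off[i] * y[i + 1]) / d[i]
    return y


def trial_steps(diag, off, g_tilde, mus):
    """Transformed steps solving (T + mu*I) s = -g_tilde for each trial mu."""
    minus_g = [-v for v in g_tilde]
    return [solve_shifted_tridiagonal(diag, off, mu, minus_g) for mu in mus]
```

Each additional trial value costs one more $O(n)$ solve; the transformed gradient is formed once per iteration, and only the accepted step needs to be mapped back through the orthogonal factor.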

The  least  eigenvalue  of  Tk  is  obtained  by  means  of  the  IMSL  routine 
EQRT1S,  and  the  solution  of  the  tridiagonal  systems  by  the  LINPACK  routine 
SGTSL. 

For solving (I) we modified the routine GSRCH originally written by M. J. D.
Powell for MINPACK [10].

The new iterate $x_{k+1}$ is accepted (Step 5 of Algorithm 2.1) only if the
condition

$$f(x_{k+1}) \le f(x_k) + \alpha\, g(x_k)^T (x_{k+1} - x_k)$$

is satisfied with $\alpha = 10^{-4}$. However, we may continue searching even if the
Newton step satisfies this criterion.
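The acceptance test is a standard sufficient-decrease check and takes only a few lines. The helper below is a minimal sketch with a hypothetical name and plain Python lists for vectors; it is not taken from the authors' Fortran code.

```python
def sufficient_decrease(f_old, f_new, g_old, x_old, x_new, alpha=1e-4):
    """Return True if f_new <= f_old + alpha * g_old^T (x_new - x_old)."""
    slope = sum(gi * (xn - xo) for gi, xn, xo in zip(g_old, x_new, x_old))
    return f_new <= f_old + alpha * slope
```

For $f(x) = x_1^2 + x_2^2$ at $(1, 1)$ with gradient $(2, 2)$, the trial point $(0, 0)$ passes the test, while $(-1, -1)$, which gives no decrease at all, fails it.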

We decide that $T_2(x_k, H_k)$ is not empty if the angle between $g_k$ and is
between 85° and 95°.


6.2  Choosing  the  sequence  Bk 


For those iterations in which $H_k = \nabla^2 f(x_k)$, the decomposition is computed
with the IMSL routines EHOUSS and EHOBS, except when the Hessian
itself is tridiagonal.

The stopping condition is (7.2.5), page 160 of Dennis-Schnabel [3]:

$$\max_{1 \le i \le n} \left\{ \frac{|\nabla f(x_k)_i| \, \max(|(x_k)_i|, 1)}{\max(|f(x_k)|, 1)} \right\} \le \mathrm{eps}$$

(eps = $10^{-15}$ in the computer implementation.)
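As an illustration, the relative-gradient stopping test can be coded directly from its definition, with typical sizes of 1 for the variables and for the function value. The function names below are hypothetical and the sketch is not the authors' implementation.

```python
def relative_gradient(x, fx, g):
    """Largest scaled gradient component: |g_i| * max(|x_i|, 1) / max(|f(x)|, 1)."""
    scale = max(abs(fx), 1.0)
    return max(abs(gi) * max(abs(xi), 1.0) for gi, xi in zip(g, x)) / scale


def converged(x, fx, g, eps=1e-15):
    """Stopping test: the relative gradient does not exceed eps."""
    return relative_gradient(x, fx, g) <= eps
```

The scaling makes the test insensitive to the units of the variables and of the function whenever their magnitudes exceed 1, which is the motivation given by Dennis and Schnabel for this criterion.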


6.3  Efficiency 


The computer program allows the user to compute the full decomposition every $q$ iterations (we use $q = 3$) or to decide automatically when to do so in between, depending upon the following notion of efficiency of an iteration. We define the efficiency of the $k$th iteration as

$$E_k = -\frac{\log r_k}{t_k},$$

where

$$r_k = \frac{f_{k+1} - f_*}{f_k - f_*},$$

$f_*$ being an estimate of $f(x_*)$, $f_{k+1} = f(x_{k+1})$, and $t_k$ the CPU time required by the $k$th iteration.

Assuming $r_k$ remains constant until convergence (denoted by $r$ hereafter), the required number of iterations NITER is approximately given by

$$r^{\mathrm{NITER}} = \mathrm{eps}.$$

Therefore, the total CPU time $T$ will be

$$T = \frac{\log \mathrm{eps}}{\log r}\, t_k = -\frac{\log \mathrm{eps}}{E_k}.$$

In order to decide what $H_{k+1}$ will be (that is, $H_{k+1} = \nabla^2 f(x_{k+1})$ or $H_{k+1} = Q_k T_{k+1} Q_k^T$) we use $E_k$ as follows. Let $k_0$ be the last iteration such that $B_{k_0} = \nabla^2 f(x_{k_0})$. If $k_0 \equiv k \pmod{q}$ or if $E_{k_0} > E_k$ then $H_{k+1} = \nabla^2 f(x_{k+1})$. Otherwise $H_{k+1} = Q_k T_{k+1} Q_k^T$.
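The decision rule can be sketched as follows. Two assumptions are made and should be read as such: efficiency is taken to be the negative logarithm of the reduction ratio per unit of CPU time, so that larger values mean faster progress, and the periodic condition is interpreted as "q iterations have elapsed since the last exact Hessian". The function names are hypothetical.

```python
import math


def efficiency(f_k, f_next, f_star, cpu_time):
    """Assumed efficiency measure: -log(r) / t with r = (f_next - f_star) / (f_k - f_star)."""
    r = (f_next - f_star) / (f_k - f_star)
    if r <= 0.0:
        # The iterate reached or passed the estimate f_star (a choice of this sketch).
        return math.inf
    return -math.log(r) / cpu_time


def refresh_hessian(k, k0, q, eff_k0, eff_k):
    """Decide whether H_{k+1} should be the exact Hessian.

    k0 is the last iteration that used an exact Hessian. Refresh when q
    iterations have elapsed (interpretation assumed here) or when the last
    exact-Hessian iteration was more efficient than the current one.
    """
    return (k + 1 - k0) >= q or eff_k0 > eff_k
```

With this convention an iteration that halves the distance to the estimated optimal value in one time unit has efficiency log 2, and a refresh is triggered as soon as the cheaper updated iterations stop paying for themselves.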
7  Numerical  Experience


The class of algorithms described in the previous sections forms the theoretical
basis of subroutine TRIDI.

The  decision  about  when  T2  is  not  empty  is  taken  according  to  a  user- 
supplied  parameter  defining  a  maximum  deviation  in  degrees  with  respect  to 
orthogonality.  This  parameter  was  defined  as  5  degrees  for  the  numerical 
experiments. 


7.1  Test  Problems 


In  order  to  demonstrate  the  effectiveness  of  the  new  method,  numerical  results 
were  obtained  not  only  for  well-known  test  examples  appearing  in  the  litera¬ 
ture  but  also  for  some  new  functions.  For  brevity,  the  full  details  of  the  test 
problems  are  not  given  here  except  for  the  following  new  ones: 

TEST  FUNCTION  PRUEBA 

f(x) = a(1)/x(1) + a(2)/x(2) + a(3)/x(3) + 0.5 (x, Cx) + (b, x)

where b(i) = 1.0 x 10^(-6) * a(i) - (i + 4) * 1.0 x 10^3 for i = 1, ..., 3, a is as
defined in Table 1, and

$$C = \begin{pmatrix} 1/3 & 1/10 & 1/10 \\ 1/10 & 1/4 & 1/10 \\ 1/10 & 1/10 & 1/5 \end{pmatrix}$$


The  underlying  idea  is  that  if  a  starting  point  is  close  to  the  origin,  the 
“wavy  behaviour”  of  the  function  leads  to  a  very  small  trust  region,  a  phe¬ 
nomenon  which  leads  to  a  rather  inefficient  performance  of  the  classical  method. 
This  shortcoming  does  not  exist  for  the  new  algorithm  because  of  the  curvilin¬ 
ear  search,  which  can  be  considered  as  a  way  of  computing  an  optimal  radius 
in  each  iteration. 
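For readers who want to experiment with PRUEBA, the function can be written down in a few lines. This is a sketch under one stated assumption: the garbled coefficient in the definition of b is read as 10^(-6) * a(i) - (i + 4) * 10^3 with 1-based index i. The function name is hypothetical.

```python
# Symmetric matrix C of the quadratic term.
C = [
    [1.0 / 3.0, 1.0 / 10.0, 1.0 / 10.0],
    [1.0 / 10.0, 1.0 / 4.0, 1.0 / 10.0],
    [1.0 / 10.0, 1.0 / 10.0, 1.0 / 5.0],
]


def prueba(x, a):
    """f(x) = sum a_i / x_i + 0.5 * (x, Cx) + (b, x), with b built from a (assumed reading)."""
    b = [1.0e-6 * ai - (i + 4) * 1.0e3 for i, ai in enumerate(a, start=1)]
    reciprocal = sum(ai / xi for ai, xi in zip(a, x))
    cx = [sum(C[i][j] * x[j] for j in range(3)) for i in range(3)]
    quadratic = 0.5 * sum(xi * ci for xi, ci in zip(x, cx))
    linear = sum(bi * xi for bi, xi in zip(b, x))
    return reciprocal + quadratic + linear
```

The reciprocal terms dominate near the origin and change rapidly there, while the quadratic and linear terms dominate far from it, which is why starting points such as (1.d-3, 1.d-3, 1.d-3) are demanding for a method that controls the step through a radius.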

TEST  FUNCTION  SNLLSQ  I 

Generate data (j, y(j)) for j = 1, ..., 15 from

y(j) = a(1) * j**xopt(1) + a(2) * j**xopt(2) + a(3) * j**xopt(3)

with a(1) = 3, a(2) = 3.1, a(3) = 0.7, xopt(1) = 1.5, xopt(2) = 2.5,
xopt(3) = -2.5.

Now  with  the  given  a,  recover  x  by  a  least-squares  fit  to  this  data. 

TEST  FUNCTION  SNLLSQ  II 

Generate data (j, y(j)) for j = 1, ..., 15 from

y(j) = a(1) * sin(j * xopt(1)) + a(2) * sin(j * xopt(2)) + a(3) * sin(j * xopt(3))

with a(i), xopt(i), i = 1, ..., 3 as in SNLLSQ I. Again, recover x by least
squares.

TEST  FUNCTION  SNLLSQ  III 

Generate data (j, y(j)) for j = 1, ..., 30 from

y(j) = a(1) * cos(j * xopt(1)) + a(2) * cos(j * xopt(2)) + a(3) * cos(j * xopt(3))

with a(1) = 10, a(2) = 20, a(3) = 30, xopt(1) = 0.1, xopt(2) = 0.2,
xopt(3) = 0.3.

Recover  x  by  least  squares. 

TEST FUNCTION SNLLSQ IV

Generate data (j, y(j)) for j = 1, ..., 45 from

y(j) = a(1) * exp(j * xopt(1)) + a(2) * exp(j * xopt(2)) + a(3) * exp(j * xopt(3))

with a(1) = 1, a(2) = 2, a(3) = 3, xopt(1) = -0.1, xopt(2) = -0.2,
xopt(3) = -0.3.

Now  recover  x  by  least  squares. 
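The four SNLLSQ problems share one structure: synthetic data are generated from a three-term model with known exponents, and the exponents are then recovered by minimizing the squared residuals. The sketch below assumes the objective is the plain sum of squared residuals (any scaling factor such as 1/2 is not stated in the text); all function names are hypothetical.

```python
import math


def snllsq_data(a, xopt, m, basis):
    """Data y(j) = sum_i a(i) * basis(j, xopt(i)) for j = 1, ..., m."""
    return [sum(ai * basis(j, xi) for ai, xi in zip(a, xopt)) for j in range(1, m + 1)]


def snllsq_objective(x, a, y, basis):
    """Sum of squared residuals of the model at x against the data y (assumed objective)."""
    total = 0.0
    for j, yj in enumerate(y, start=1):
        r = sum(ai * basis(j, xi) for ai, xi in zip(a, x)) - yj
        total += r * r
    return total


def power(j, p):      # basis of SNLLSQ I
    return float(j) ** p


def sine(j, p):       # basis of SNLLSQ II
    return math.sin(j * p)


def cosine(j, p):     # basis of SNLLSQ III
    return math.cos(j * p)


def exponential(j, p):  # basis of SNLLSQ IV
    return math.exp(j * p)
```

For SNLLSQ I, `y = snllsq_data((3, 3.1, 0.7), (1.5, 2.5, -2.5), 15, power)` gives the data, and `snllsq_objective` vanishes at `xopt` while being large at the starting point 3.50 * xopt listed in Table 1.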

From  here  on  we  use  the  notation  tfn.n.cn.sp,  where  tfn  is  the  test  function 
number,  n  the  number  of  variables,  cn  the  case  number  and  sp  the  identification 
of  the  starting  point. 

The  following  table  defines  the  problems: 
Table 1

tfn  Name                      n     cn                                      sp
1    Prueba                    3     1: a(i) = 1.d-1                         1: (1.d-3, 1.d-3, 1.d-3)
1                              3     2: a(1) = 1.d3, a(2) = a(3) = 1.d0      2: (0.25, 0.25, 0.25)
1                              3     3: a(1) = a(2) = a(3) = 1.d1
2    Penalty I [3]             4     1                                       ...
2                              8     1
3    Variable Dimensioned [3]  4     1                                       1: x(j) = 1 - j/n
3                              5
3                              8
3                              12
4    Rosenbrock [3]            4     1                                       1: x(2j-1) = -1.2, x(2j) = 1
4                              8
4                              10
4                              12
5    Chained Rosenbrock [3]    25    1                                       1: x(j) = -1
6    Powell Extended [3]       4     1                                       1: x(4j-3) = 3, x(4j-2) = -1,
6                              8     1                                          x(4j-1) = 0, x(4j) = 1
6                              240   1
6                              400   1
7    Brown-Dennis              4     1                                       1: (25, 5, -5, 1)
8    Gaussian [3]              3     1                                       1: (0.4, 1, 0)
9    Trigonometric [5]         25    1                                       1: x(j) = 1
9                              50
9                              100
9                              200
10   Watson [3]                12    1                                       1: x(j) = 0
11   Wood [3]                  4     1                                       1: (-3, -1, -3, -1)
12   Box [3]                   3     1                                       1: (0, 10, 20)
13   Biggs Exp 6 [3]           6     1                                       1: (1.2, 1, 1, 1, 1, 1)
14   Dennis-Marwil I [2]       10    1: ...; k1 = k3 = 1; k2 = 5             1: x(j) = -1
14                             100   2: r1 = 1; r2 = n; k1 = 4; k2 = k3 = 1
15   Dennis-Marwil II [2]      5     1                                       ...
16   Pseudo Penalty [3]        50    1                                       1: x(j) = 0
17   SNLLSQ I                  3     1                                       1: x(j) = 3.50 * xopt(j)
18   SNLLSQ II                 3     1                                       1: x(j) = 1.15 * xopt(j)
19   SNLLSQ III                3     1                                       1: x(j) = 1.50 * xopt(j)
20   SNLLSQ IV                 3     1                                       1: x(j) = 3.00 * xopt(j)
7.2  Numerical  Results


The following table gives the numerical results obtained, using the notation:

NIT  = number of iterations
FE   = number of function evaluations
GE   = number of gradient evaluations
HE   = number of Hessian evaluations
T    = relative CPU time with respect to the IMSL routines
FMIN = computed minimum

For each problem three sets of results are given: the first row corresponds
to the routine DUMIAH (trust region algorithm), the second and third to the
new method with and without the efficiency test of Section 6.3, respectively.
For the last four test problems the first row corresponds to the results obtained
with the routine DUMIDH. Error 6 in DUMIAH means that five consecutive steps have
been taken with the maximum step length.

The computational tests were carried out in double precision on a Hewlett-Packard 9000 825S computer using software written in Fortran 77 under the HP-UX operating system and on an IBM 4361. Two different computers were used mainly because the efficiency idea is quite sensitive to the precision with which the CPU time is measured. Timing routines like the one provided in the IMSL Library, or others available for UNIX systems, do not meet our accuracy requirements: different runs of the same problem may give differences that are unacceptable for our purposes. Some of the small problems were therefore run on an IBM computer for which the staff of the University of La Plata Computer Center wrote a very precise assembler routine for measuring CPU time. For several reasons, it was not feasible to run all examples on that computer, so most of the results are from the HP machine. In order to normalize comparisons, all results are given relative to the CPU time required by the IMSL optimization routines, except in the examples in which they failed to converge properly. All comparisons of the new method have been made against the trust-region algorithm as implemented in subroutine DUMIAH of the IMSL Library (version 1.0, April 1987), with the only exception of the separable nonlinear least squares problems, for which subroutine DUMIDH was used because a finite-difference Hessian was required.
Table 2

Problem       NIT    FE    GE    HE     T       FMIN
1.3.1.1        18    33    19    18    1.00    -0.13e+09
               11    12    12     5    0.29    -0.13e+09
               13    14    14     4    0.33    -0.13e+09
1.3.1.2        12    14    13    12    1.00    -0.13e+09
                5     6     6     2    0.24    -0.13e+09
                5     6     6     2    0.22    -0.13e+09
1.3.2.1       error 6
               23    24    24     9    0.09    -0.13e+09
               22    23    23     8    0.06    -0.13e+09
1.3.2.2        13    26    14    13    1.00    -0.13e+09
                8     9     9     3    0.26    -0.13e+09
                8     9     9     2    0.24    -0.13e+09
1.3.3.1        23    33    24    23    1.00    -0.13e+09
               17    18    18     8    0.45    -0.13e+09
               18    19    19     5    0.45    -0.13e+09
1.3.3.2        12    20    13    12    1.00    -0.13e+09
                5     6     6     2    0.20    -0.13e+09
                5     6     6     2    0.19    -0.13e+09
2.4.1.1        34    48    35    34    1.00     0.23e-04
               11    12    12     5    0.50     0.24e-04
               12    13    13     4    0.33     0.24e-04
2.8.1.1        34    43    35    34    1.00     0.54e-04
               15    16    16     5    0.88     0.57e-04
               17    21    21     6    1.09     0.57e-04
3.4.1.1        10    11    11    10    1.00     0.24e-27
               12    13    13     5    1.10     0.21e-30
               12    13    13     4    1.88     0.78e-12
3.5.1.1        11    12    12    11    1.00     0.13e-28
               14    15    15     6    3.79     0.27e-19
               14    34    34     4    3.74     0.61e-17
3.8.1.1        13    14    14    13    1.00     0.53e-26
               17    18    18     5    4.75     0.22e-24
               16    18    18     6    4.75     0.19e-16
3.10.1.1       14    15    15    14    1.00     0.18e-25
               18    21    21     5    7.07     0.15e-14
               18    19    19     6    6.13     0.46e-19
4.4.1.1        23    34    24    23    1.00     0.55e-20
               31    50    49    14    1.16     0.39e-31
               39    72    71    10    1.45     0.77e-21
4.8.1.1        23    34    24    23    1.00     0.11e-19
               35    63    61    16    1.85     0.29e-27
               42    91    88    11    2.21     0.34e-23
4.10.1.1       23    34    24    23    1.00     0.14e-19
               36    75    73    12    1.68     0.28e-11
               36    75    73    12    1.46     0.23e-11
4.12.1.1       23    34    24    23    1.00     0.16e-19
               38    87    84    13    1.80     0.18e-15
               38    87    84    13    1.78     0.18e-15
5.25.1.1       15    19    16    15    1.00     0.14e-13
               19    51    49     7    0.62     0.13e-15
               19    51    49     7    0.56     0.13e-15
6.4.1.1        15    17    16    15    1.00     0.46e-08
               19    20    20     7    1.10     0.46e-08
               19    20    20     7    1.00     0.47e-08
6.8.1.1        15    17    16    15    1.00     0.92e-08
               22    27    27     8    1.58     0.63e-08
               22    27    27     8    1.68     0.63e-08
6.240.1.1      15    17    16    15    1.00     0.27e-06
               23    38    39     6    0.39     0.93e-06
               20    39    40     7    0.47     0.19e-05
6.400.1.1      15    17    16    15    1.00     0.45e-06
               23    36    37     6    0.33     0.16e-05
7.4.1.1         8    10     9     8    1.00     0.86e+05
                9    16    16     5    1.26     0.86e+05
               13    19    19     4    1.39     0.86e+05
8.3.1.1         1     4     2     1    1.00     0.11e-07
                2     3     3     1    0.41     0.11e-07
                2     3     3     1    0.47     0.11e-07
9.25.1.1        6    20     7     6    1.00    -0.75e+04
                9    22    22     3    0.94    -0.75e+04
                9    22    22     3    0.94    -0.75e+04
9.50.1.1        8    26     9     8    1.00    -0.31e+05
               13    16    15     6    0.90    -0.31e+05
               17    28    27     5    0.91    -0.31e+05
9.100.1.1      17    39    18    17    1.00    -0.12e+06
               20    45    45     7    0.68    -0.12e+06
               20    45    45     7    0.58    -0.12e+06
9.200.1.1      23    43    64    35    1.00    -0.50e+06
               22    43    43     8    0.70    -0.50e+06
               22    43    43     8    0.72    -0.50e+06
10.12.1.1      12    26    13    12    1.00     0.22e-07
               22    52    48     8    0.97     0.23e-07
               22    52    48     8    0.85     0.22e-07
11.4.1.1       12    26    13    12    1.00     0.47e-09
               12    59    56     7    0.76     0.49e-07
               12    61    57     8    1.03     0.15e-07
12.3.1.1        7    14     8     7    1.00     0.54e-16
               10    14    14     4    1.00     0.14e-11
               10    14    14     4    0.94     0.14e-11
13.6.1.1       29    60    30    29    1.00     0.11e-11
               33    52    46    13    0.77     0.13e-12
               53    85    77    14    1.19     0.36e-12
14.10.1.1      12    23    13    12    1.00     0.29e-15
                1     7     6     1    0.76     0.23e-21
                1     7     6     1    0.76     0.23e-21
14.100.2.1     17    37    18    17    1.00     0.81e-15
                1     6     6     1    0.16     0.71e-25
                1     6     6     1    0.16     0.71e-25
15.10.2.1      12    23    13    12    0.52     0.17e-15
                1    10    10     1    0.19     0.38e-22
                1    10    10     1    0.15     0.38e-22
15.5.1.1        4     6     5     4    1.00     0.24e-13
                5     6     6     2    1.00     0.67e-12
                5     6     6     2    1.01     0.67e-12
16.50.1.1     100   111   101   100    1.00     0.23e-03
               27    73    70     8    0.20     0.23e-03
               35    87    86     9    0.20     0.23e-03

In  the  following  nonlinear  least  squares  problems  the  absolute  CPU  time 
is  given  because  of  the  poor  performance  of  the  trust  region  algorithm  which 
led  to  divergence  in  one  example,  a  large  number  of  function  evaluations  in 
another,  and  to  a  very  high  functional  value  in  the  third. 
Problem       NIT    FE    GE    HE     T       FMIN
17.3.1.1        7    78    29     0    1.73     0.70e+02
               64   182   237     0    5.91     0.33e-18
               56   140   194     0    4.45     0.35e-12
18.3.1.1      divergence
               13    35    46     0    0.72     0.15e-21
               16    36    54     0    0.87     0.92e-25
19.3.1.1       26    84   105     0    2.97     0.42e-18
               31    37    64     0    1.61     0.33e-18
               31    39    72     0    1.54     0.42e-22
20.3.1.1        4    19    17     0    0.92     0.17e-01
               31    59    90     0    3.08     0.33e-09
               29    59    88     0    2.93     0.93e-07

The  test  examples  show  the  new  algorithm  to  be  more  robust  (in  fact, 
no  example  of  divergence  has  been  found)  than  the  trust  region  method,  and 
that  its  efficiency  tends  to  increase  with  the  number  of  variables.  This  is  so 
because  of  the  savings  in  Hessian  evaluations,  and  in  spite  of  the  CPU  time 
spent  on  the  computation  of  the  least  eigenvalue  of  the  tridiagonal  factor, 
which  is  relatively  more  important  in  small  size  problems. 


7.3  Comparisons  with  not  Updating 

The following examples show that our update is better than keeping the
Hessian constant for q iterations. In particular, we compare not
updating (we call this method HC) against the method obtained by updating
the Hessian but without the test of Section 6.3 (WE = without efficiency).

The results of these tests convince us that our updating scheme
is worthwhile. This is true despite the fact that no stronger convergence result
holds for our updating scheme than for not updating.
Table 3

Problem       NIT    FE    GE    HE     T       FMIN        q    Method
1.3.2.1        22    23    23     8    1.00    -0.13e+9     4    WE
               40    41    41    14    1.39    -0.13e+9     4    HC
               21    22    22     4    1.00    -0.13e+9     6    WE
               63    64    64    11    1.73    -0.13e+9     6    HC
               31    32    32     4    1.00    -0.13e+9    10    WE
               91    92    92    10    1.62    -0.13e+9    10    HC
2.8.1.1        17    21    21     6    1.00    +0.57e-4     4    WE
               21    22    21     7    1.20    +0.57e-4     4    HC
               15    33    32     3    1.00    +0.57e-4     6    WE
               31    35    34     6    1.40    +0.57e-4     6    HC
               16    33    32     2    1.00    +0.57e-4    10    WE
               41    43    42     5    1.51    +0.57e-4    10    HC
10.12.1.1      22    52    48     8    1.00    +0.22e-7     4    WE
               51    61    58    17    2.31    +0.43e-7     4    HC
               37    98    92     7    1.00    +0.24e-7     6    WE
               72   177   159    12    1.66    +0.42e-7     6    HC
               51   108   102     6    1.00    +0.43e-7    10    WE
               96   260   232    10    1.69    +0.43e-7    10    HC
16.50.1.1      35    87    86     9    1.00    +0.23e+3     4    WE
                    127   124     8    0.88    +0.23e+3     4    HC
               31    66    63     6    1.00    +0.23e+3     6    WE
               51   184   183     9    1.47    +0.23e+3     6    HC
               25    52    49     3    1.00    +0.23e+3          HC
               49   162   161     5    1.47    +0.23e+3    10    HC